ABSTRACT

Computer assisted language learning (CALL) packages
offer the majority of students who are learning English as a foreign 
language the opportunity for individual instruction. To meet the 
needs of an individual student, an adaptive CALL environment must 
have a dynamic model of student performance, a means of varying the 
difficulty of the learning task, and a mapping between student 
competence and task complexity. There are two main types of 
user-adaptive interfaces for language learning: discrete-step 
interfaces and continuously variable interfaces. Before designers can 
build CALL systems that "understand" their users, they must be able 
to analyze the interactions between the user and the computer in the 
language learning task. Language learning skills may be divided into 
the categories of lexical skills, syntactical skills and discourse 
skills. The first task in recording and measuring student performance 
is to devise a user profile; the second task is to ensure a continuum 
of exercises. Once the student has decided what linguistic skill to 
work on, there are four stages to the exercise generation process: 
(1) determining a suitable source; (2) choosing suitable passages; 
(3) selecting from these passages examples most suited to user needs; 
and (4) generating the electronic version of the exercise. The format 
of the exercise used in this study was found to be generally 
effective; however, it is unclear to what degree there is a
correlation between readability grade and exercise difficulty. 
Abstract: To meet the needs of an individual student, an adaptive CALL package must have a dynamic model of student performance, a means of varying the difficulty of the learning task, and a mapping between student competence and task complexity. This paper analyses how these components can be implemented for lexical, syntactical and discourse skills using domain knowledge from the Oxford Advanced Learner's Dictionary and the Susanne Corpus, a fully tagged subset of the Brown Corpus.

Rationale for Computer Assisted Language Learning 

Students who are learning English as a Foreign Language in an English-speaking country form a multicultural group with disparate motivations and goals, whose diversity cannot be adequately accommodated in a traditional classroom. There is no rate of imparting information, nor any sequence of instruction, that will meet the needs of every student. Teachers can only construct in their minds a model of the 'average' student and have as their goal the adequate progress of this hypothetical student; frequently they are guiltily aware that the more able will be bored while the less able will be lost. Today computerised language learning packages offer the majority of students their only realistic opportunity for individual tuition. Even with the most mundane of CALL packages, each student can attempt an answer to every question, repeat lessons not fully comprehended or skip lessons that merely rehearse previously acquired skills. A CALL package which permits a student to make this sort of choice is sometimes referred to as a user-tailored system.

Self-directed learning of this kind has been shown to benefit mature learners, that is, experienced students of proven academic competence (Kearsley & Hillelsohn, 1982) who are capable of determining their own educational needs. However, less experienced or less able students may not be able to make an accurate assessment of their own shortcomings or to devise for themselves a coherent and comprehensive study plan. For these students, a CALL package can be immeasurably improved by a user-adaptive interface, i.e. an interface where adaptations in the order and pace of learning are made by the system, not by the student.

User-Adaptive Interfaces 

A user-adaptive interface is one which can change its behaviour automatically in response to its experience of user performance; that is, it changes to suit the skills and knowledge of an individual student. The system designer or language teacher no longer tries to construct an interface for a stereotypical average user, to which no real learner conforms. Instead, the designer accepts that no single learning theory pattern is suitable for all students (or even for one student over a range of skills or a period of time). The designer also realises that the teaching package must adapt to the student's varying abilities in different skills by providing information and exercises at a level which matches the student's current needs. To this end, an adaptive learning package must have three components:

• a means of recording and measuring individual student performance. This could also be described as a dynamic model of the user's past performance and current capability. This dimension of capability is inevitably continuous rather than discrete.

• a means of adjusting the learning task so as to change its difficulty. In the context of language learning this might simply be the ability to offer appropriate help and linguistic explanations to students of various levels of ability, and to set for each student exercises commensurate with the student's expected performance.

• mappings of help level onto student capability and of student capability onto degree of difficulty. How these mappings are achieved determines the type of user-adaptive interface.

Types of User-Adaptive Interfaces 

There are two main types of user-adaptive interfaces for language learning: discrete-step interfaces and 
continuously variable interfaces. 

Discrete-step interfaces 

A discrete-step interface is an interface which identifies the user as a member of a particular ability group and sets the interface to correspond with the skill level for that group. This is more satisfactory than a general user interface. However, it does not allow for continuous variation in student capability; instead it assumes discrete ability levels. In language learning it is a practical solution for the organisation of expert procedural knowledge, which is the distillation of the rules that govern the lexical, syntactical and discourse structures of the language.

The traditional presentation format for this knowledge has been textbooks and reference books. Such books might be aimed at a specific category of language learner (e.g. novice, intermediate, expert), or might be an exhaustive exposition of all facets of a language for scholars of comparative linguistics and languages. Knowledge of this kind has never formed a continuum: each text represents the independent view of one teacher or expert on the information needed by an average learner at a typical stage in the learning process. The order in which the information is presented may be said to represent the author's strategy for how best to accomplish the learning task.

Computerised learning packages, particularly those implemented in hypertext, allow us to separate monolithic texts into fragments and to convert these fragments into many different virtual structures. This is useful for accommodating students whose ability over different linguistic skills varies widely. In one skill a student may be given a simple explanation from first principles. In another skill, the same student may be given only a brief hint or reminder because past performance indicates a consistently high level of ability. In a skill where the student's ability is intermediate, extra information and/or explanation of more abstruse points may be provided. But however skilful the dissection of the information, and however varied the virtual knowledge structures in which it can be stored, the computer cannot rewrite or even rephrase the expert knowledge to cater uniquely for the needs of an individual student, as a human teacher might be able to do. Thus, the interface for expert procedural knowledge remains a discrete-step interface.
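The sketch below shows how a discrete-step interface collapses a continuous competence score into a few fixed help levels, like the hint / intermediate / first-principles tiers described above. It rests on several assumptions. The cut-points are hypothetical, the 0 to 1 competence scale is assumed, and the names `THRESHOLDS`, `HELP_LEVELS` and `help_level` are illustrative rather than taken from the package.

```python
from bisect import bisect_right

# Hypothetical cut-points on an assumed 0-1 competence scale.
THRESHOLDS = [0.4, 0.75]
# One pre-written hypertext fragment per discrete step, from fullest to briefest.
HELP_LEVELS = [
    "full explanation from first principles",
    "extra information on more abstruse points",
    "brief hint or reminder",
]

def help_level(competence: float) -> str:
    """Place a continuous competence score into one of the discrete help levels."""
    if not 0.0 <= competence <= 1.0:
        raise ValueError("competence must lie in [0, 1]")
    return HELP_LEVELS[bisect_right(THRESHOLDS, competence)]
```

However finely the thresholds are tuned, every student inside a band receives the same fragment. This is the limitation the paragraph attributes to discrete-step interfaces.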

Continuously variable interface 

Although levels of procedural knowledge are discrete, the domain knowledge in any natural language is, of course, a continuum. Domain knowledge consists of the accumulated oral and written records that are accepted by the native speakers of the language. The documents that comprise these sources may in themselves be as discrete as the expositions of procedural knowledge. Together, however, they form a continuum that represents every aspect of language use in every sphere of human activity. This is recognised in the accumulation of corpora: collections of texts from different genres, assembled to enable language scholars to investigate the evolution and current state of a language. The remainder of this paper discusses how corpora may be used to build an interface which uniquely adapts to the linguistic needs of an individual student when generating exercises to improve that student's linguistic skills. As a first step, it looks at ways of recording and measuring student performance.

Recording and Measuring Student Performance 

Before designers can build CALL systems that 'understand' their users, they must be able to analyse the interactions between the user and the computer in the language learning task. This means that they must be able to specify the skills that make up the learning task and must have ways of measuring student performance in individual skills. Language learning skills may be divided into three major categories:
• Lexical skills pertain to words.

• Syntactical skills are the skills needed to produce grammatically correct phrases or sentences.

• Discourse skills are the skills needed to write coherent and cohesive text.
Obviously these skills overlap, but they nevertheless provide the parameters for creating a student performance model. The parameters are given values which indicate the student's past performance in the relevant skill. The student's attainment can be assessed simply by measuring the student's success in linguistic exercises designed to assess and improve the skill. In this way, student competence can be viewed as a continuum. The student exercises should, of course, be set at a level appropriate to the student's current attainment, so that every student is stretched to the limit of current ability: competent students should not be bored, and weaker students should not be intimidated.
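As a rough illustration, the per-skill performance model could be sketched as follows. Each of the three skill categories holds a score in [0, 1], and each completed exercise nudges that score towards the exercise's success rate. The exponential-moving-average update, the starting value of 0.5 and the `rate` weight are assumptions made for this sketch; the paper specifies only that the parameters reflect past success in exercises.

```python
from dataclasses import dataclass, field

SKILLS = ("lexical", "syntactical", "discourse")

@dataclass
class StudentModel:
    """Continuous per-skill competence scores in [0, 1] (an assumed scale)."""
    scores: dict = field(default_factory=lambda: {s: 0.5 for s in SKILLS})
    rate: float = 0.3  # hypothetical weight given to the newest exercise result

    def record(self, skill: str, correct: int, attempted: int) -> float:
        """Blend the success rate of one exercise into the running score for a skill."""
        if attempted <= 0:
            raise ValueError("attempted must be positive")
        success = correct / attempted
        old = self.scores[skill]
        self.scores[skill] = (1 - self.rate) * old + self.rate * success
        return self.scores[skill]
```

For example, `StudentModel().record("lexical", 8, 10)` moves the lexical score from 0.5 to about 0.59. A larger `rate` makes the model react faster to recent performance, at the cost of being more easily swayed by a single unusually easy or hard exercise.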

An initial user profile

The first task is to devise an initial user profile. This can be done simply by using a standard model for a novice, intermediate or expert learner. This model will rapidly self-adjust to give a more accurate assessment of proficiency in each skill as the student uses the package. A further refinement is to incorporate into the student model some element that reflects the average performance in a particular skill of students from different language groups.
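A minimal sketch of seeding a profile from a stereotype follows, with an optional adjustment for the student's language group. All numbers are hypothetical placeholders, not values from the study. The offsets stand in for "the average performance in a particular skill of students from different language groups", and the function name `initial_profile` is illustrative.

```python
from typing import Dict, Optional

# Hypothetical starting scores for each stereotype (0 = weakest, 1 = strongest).
STEREOTYPES: Dict[str, Dict[str, float]] = {
    "novice":       {"lexical": 0.2, "syntactical": 0.2, "discourse": 0.1},
    "intermediate": {"lexical": 0.5, "syntactical": 0.5, "discourse": 0.4},
    "expert":       {"lexical": 0.8, "syntactical": 0.8, "discourse": 0.7},
}

def initial_profile(level: str,
                    group_offsets: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Copy a stereotype and shift each skill by an (assumed) language-group offset."""
    profile = dict(STEREOTYPES[level])  # copy so the stereotype itself is unchanged
    for skill, delta in (group_offsets or {}).items():
        # Clamp so the adjusted score stays on the 0-1 scale.
        profile[skill] = min(1.0, max(0.0, profile[skill] + delta))
    return profile
```

The returned dictionary only seeds the model. Subsequent exercise results would be expected to overwrite it quickly.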

Ensuring a Continuum of Exercises

The second task is to ensure a continuum of exercises. This is only possible if the CALL package is able to assign to each exercise a parameter which gives a measure of its expected difficulty. How this can be done depends upon the form of the domain knowledge on which the exercise is based. There are two main forms: continuous text, and pre-selected passages, sentences or phrases that have been chosen by experts to illustrate precise linguistic features.
• Continuous text 

A simple way to assign a coefficient of difficulty to continuous text is to use one (or more) of the methods that have been developed to assess readability grades: the more difficult a passage is to read, the more demanding any exercise based on it should be. Two common measures of readability are Gunning's Fog Index (Gunning, 1952) and Information Density (Wainwright, 1984).
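Gunning's Fog Index is usually stated as 0.4 × (average sentence length in words + percentage of words with three or more syllables). The sketch below computes it with a crude vowel-group syllable counter. That counter is an assumption, and it omits Gunning's refinements such as excluding proper nouns and counting -ed/-es endings separately, so scores will differ somewhat from hand-calculated ones. Information Density is not sketched here because its formula is not given in this paper.

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels, allowing for a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1  # e.g. "make" has one syllable, not two
    return max(1, count)

def fog_index(text: str) -> float:
    """Gunning Fog Index: 0.4 * (words per sentence + 100 * complex words / words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))
```

For "The cat sat on the mat. It was a beautiful day." the sketch finds 11 words, 2 sentences and 1 complex word ("beautiful"). The index is therefore 0.4 × (5.5 + 100/11), about 5.84.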

Figure 1 shows Fog Indices and Information Densities for texts in the Susanne Corpus (Sampson, 1992). The coefficients obtained from these two methods do not always correlate, and other factors must also be taken into account when selecting material. However, either method gives a continuous scale of difficulty which can be correlated against student proficiency.
Figure 1. Gunning's Fog Index and Information Density coefficients for texts in the Susanne Corpus

• Pre-selected passages 

A general text database is not always a suitable source for language exercises. This is particularly so for advanced lexical exercises, e.g. differentiating frequently confused words such as early and soon. There are two reasons:

• there is no guarantee that the word has been correctly used;

• the database may not contain enough usage examples to create a worthwhile exercise.
A more appropriate source for lexical exercises is a dictionary. Here the usage coverage for every word is exhaustive and the quality of the examples is assured. However, because the examples consist almost entirely of single sentences or phrases, it is not possible to use Gunning's Fog Index or Information Density to assess difficulty. For exercises where there is an abundance of material (e.g. verb particles), other methods can be used to choose appropriate examples. One of these is based on word frequency counts, e.g. using only verbs that are among the 1000, 1500 or 2000 (and so on) most frequently used words. The word counts may be determined either from general literature or from texts relating to the student's primary discipline; the latter will increase student motivation. (Sometimes, even in a dictionary, usage examples may be inadequate.)
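The frequency-band method can be sketched as follows. The sketch ranks every word in a reference text by frequency, then keeps only dictionary examples whose headword falls within the chosen band (e.g. the top 1000, 1500 or 2000 words). The tokenisation, the `(headword, sentence)` pairing and the function names are assumptions for illustration. Passing a discipline-specific corpus instead of general literature changes only the input text.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

def frequency_ranks(corpus_text: str) -> Dict[str, int]:
    """Rank words by descending frequency (1 = most frequent); ties keep first-seen order."""
    counts = Counter(re.findall(r"[a-z']+", corpus_text.lower()))
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}

def within_band(examples: List[Tuple[str, str]], ranks: Dict[str, int],
                band: int) -> List[Tuple[str, str]]:
    """Keep (headword, sentence) pairs whose headword is among the `band` most frequent words."""
    return [(w, s) for w, s in examples if ranks.get(w.lower(), float("inf")) <= band]
```

Widening `band` as the student's lexical score rises is one way the frequency bands could supply the continuum of difficulty that readability grades cannot provide for single sentences.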

Hence, the type of exercise and the preferred source for an exercise of that type both influence 
implementation algorithms. 

Implementation of the Exercise Generation Package 

Exercise generation is dynamic. Once the student has decided what linguistic skill to work on, there are four stages to the generation process:

1. Determine a suitable source text.

2. Retrieve passages which illustrate the required linguistic feature.

3. Sieve the retrieved examples to leave those most suited to the needs of the current user.

4. Generate the electronic version of the exercises.
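The four stages can be sketched as a simple pipeline in which each stage is a pluggable function, so that sources and exercise types can be swapped without changing the overall flow. The function name, parameter names and signatures are hypothetical. The stage implementations themselves depend on the source and exercise type.

```python
from typing import Callable, List

def generate_exercise(
    skill: str,
    competence: float,
    choose_source: Callable[[str, float], str],
    retrieve: Callable[[str, str], List[str]],
    sieve: Callable[[List[str], float], List[str]],
    render: Callable[[List[str]], str],
) -> str:
    source = choose_source(skill, competence)   # 1. determine a suitable source
    passages = retrieve(source, skill)          # 2. retrieve illustrative passages
    examples = sieve(passages, competence)      # 3. keep those suited to this user
    return render(examples)                     # 4. build the electronic exercise
```

Because competence is passed to both the source-selection and sieving stages, the student model influences the exercise at two points: which text is used, and which examples from it survive.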

Determining a suitable source 

Initially, only two text sources, or knowledge domains, were readily available to this project: the Oxford Advanced Learner's Dictionary (OALD) (Oxford, 1974) and the Susanne Corpus (Sampson, 1992). Consequently, it was decided simply to have a two-column table of skill against source to determine the more appropriate knowledge domain. If the source assigned is the Susanne Corpus (i.e. a continuous text source), the program uses the parameter for student competence in that skill to compute a commensurate readability grade. It then selects the Susanne text that matches that grade most closely. As the number of sources available to the project increases, a more complex algorithm may be needed. A project priority is to investigate automatic parsing so that texts directly related to a student's prime discipline can be used by the exercise generator. This should increase student motivation by making exercises more directly relevant to student needs.
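A sketch of this selection step follows: a two-column skill-to-source table, plus a nearest-grade match for continuous text. The table entries, the linear mapping from competence to a grade range of 4 to 16, and the text identifiers are all hypothetical. The paper does not state the formula it uses to compute a "commensurate readability grade".

```python
from typing import Dict, Optional, Tuple

# Hypothetical two-column table of skill against source.
SOURCE_FOR_SKILL: Dict[str, str] = {
    "pronouns": "Susanne",
    "articles": "Susanne",
    "confusable_words": "OALD",
    "verb_particles": "OALD",
}

def target_grade(competence: float, lo: float = 4.0, hi: float = 16.0) -> float:
    """Assumed linear mapping of competence in [0, 1] onto a readability-grade range."""
    return lo + competence * (hi - lo)

def choose_source(skill: str, competence: float,
                  graded_texts: Dict[str, float]) -> Tuple[str, Optional[str]]:
    """Return (source, text id); the text id is None for non-continuous sources."""
    source = SOURCE_FOR_SKILL[skill]
    if source != "Susanne":
        return source, None
    grade = target_grade(competence)
    # Pick the text whose readability grade is closest to the target grade.
    best = min(graded_texts, key=lambda name: abs(graded_texts[name] - grade))
    return source, best
```

With texts graded 6.0, 9.5 and 13.0, a competence of 0.5 gives a target grade of 10.0 and selects the 9.5 text. Replacing `target_grade` with an empirically calibrated mapping would not affect the rest of the function.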

Choosing suitable passages 

Once a source has been selected, there are two main methods of retrieving appropriate examples: direct 
lexical search and searching for tags. 

• Direct lexical search

This is the more straightforward search and needs little explanation. The program searches the selected source texts for occurrences of a precise string of letters, usually a word and occasionally a word stem. The technique can be used with tagged or untagged sources. An indexed text affords a considerable increase in speed of retrieval.
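The speed-up from indexing comes from looking words up in a prebuilt table instead of rescanning the text. A minimal inverted-index sketch follows, assuming the source has already been split into sentences. Whole-word lookup is a single dictionary access. The stem search shown is a naive prefix scan over the vocabulary, which is an assumption of this sketch; a sorted word list searched with `bisect` would scale better.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def build_index(sentences: List[str]) -> Dict[str, Set[int]]:
    """Map each lower-cased word to the positions of the sentences containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for i, sentence in enumerate(sentences):
        for word in re.findall(r"[a-z']+", sentence.lower()):
            index[word].add(i)
    return index

def search(index: Dict[str, Set[int]], sentences: List[str],
           term: str, stem: bool = False) -> List[str]:
    """Return sentences containing `term` as a whole word, or as a prefix if stem=True."""
    term = term.lower()
    if stem:
        hits: Set[int] = set()
        for word, ids in index.items():
            if word.startswith(term):
                hits |= ids
    else:
        hits = index.get(term, set())
    return [sentences[i] for i in sorted(hits)]
```

For example, searching for the stem "arriv" retrieves sentences containing both "arrived" and "arrive". Searching for the word "early" matches only that exact form.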

• Searching for tags 

A complication in using tagged sources is that every major database uses a different tagging system. Hence the linguistic parameters for retrieval, and even the retrieval algorithm, vary with the source. Consequently, rather than explain algorithms in detail, it may be more generally useful to compare the two main tagging systems encountered in this project and highlight features of each that are particularly valuable in exercise selection.

• Susanne Corpus 

The Susanne Corpus is a 128,000-word subset of the Brown Corpus (Francis, 1989) comprising 64 files, each of more than 2000 words, drawn from four Brown genres:

A press reportage

G belles lettres, biography, memoirs

J learned (mainly scientific and technical writing)

N adventure and Western fiction
The texts in Susanne have been manually analysed and annotated in a way that gives access to both surface and logical structure. Figure 2 shows a portion of text from Susanne. The most valuable fields in Susanne for exercise extraction are:

• the word and lemma fields, for finding examples for lexical exercises;

• the wordtag field, for finding occurrences of syntactical features, for finding discourse frameworks and, occasionally, for lexical queries.

The parse field has not yet proved useful for selecting discourse passages, but structural complexity may give another measure of the degree of difficulty of a text.
Figure 2. Tagged text from the Susanne Corpus
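The sketch below shows how tag and lemma searches might be run over Susanne. It assumes each corpus line carries six tab-separated fields in the order reference, status, wordtag, word, lemma, parse. That layout should be checked against the Susanne documentation before use. The example wordtags in the usage note (e.g. tags beginning "PPH" for third-person personal pronouns) are likewise illustrative assumptions, not a verified list.

```python
from typing import Iterable, List, NamedTuple, Tuple

class Token(NamedTuple):
    ref: str
    status: str
    wordtag: str
    word: str
    lemma: str
    parse: str

def read_tokens(lines: Iterable[str]) -> List[Token]:
    """Parse corpus lines, assuming six tab-separated fields per token."""
    tokens = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 6:            # skip blank or malformed lines
            tokens.append(Token(*fields))
    return tokens

def find_by_tag(tokens: List[Token], tag_prefix: str) -> List[Tuple[str, str]]:
    """Return (reference, word) for every token whose wordtag starts with tag_prefix."""
    return [(t.ref, t.word) for t in tokens if t.wordtag.startswith(tag_prefix)]

def find_by_lemma(tokens: List[Token], lemma: str) -> List[Tuple[str, str]]:
    """Return (reference, word) for every inflected form of the given lemma."""
    return [(t.ref, t.word) for t in tokens if t.lemma == lemma]
```

On three illustrative lines tagged "PPHS1m", "VVDv" and "PPHO2", `find_by_tag(tokens, "PPH")` returns the two pronoun tokens, and `find_by_lemma(tokens, "bend")` returns "bent". The sentence containing each hit can then be recovered from the reference field.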



• Oxford Advanced Learner's Dictionary

In contrast with the Susanne Corpus, the example sentences and phrases in the OALD are not parsed. It is therefore sometimes difficult to use these examples for the automatic generation of syntactical exercises. Tagging in the OALD is primarily associated with the headword for each dictionary entry or, where a headword has several senses, with each sense. Grammatical information is detailed and includes plurals, comparatives and superlatives; idiomatic expressions and phrasal verbs are fully enumerated and well defined. Words and expressions that fall into specialist English registers (e.g. accounts, aerospace, algebra) are labelled. Every verb is classified by how it can be used, e.g.

[VP18A] S + vt + noun/pronoun + infinitive
I felt the house shake

Many of the verbs can take several patterns, and these are listed together after the sense number, e.g.

VP6A, 8, 9, 10, 18A, 19A, 24

The definition follows, and then the example sentences. Unfortunately, there is no explicit link between a verb pattern and the instantiation of that pattern. Consequently, it is difficult to extract an illustrative sentence automatically.
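Recovering which patterns a verb sense allows is straightforward. This sketch expands an abbreviated pattern list such as "VP6A, 8, 9, 10, 18A, 19A, 24" into full pattern codes, assuming the list is comma-separated and that only the first item carries the "VP" prefix. The function name is hypothetical. The result still says nothing about which example sentence instantiates which pattern, which is the gap described above.

```python
import re
from typing import List

def parse_verb_patterns(entry: str) -> List[str]:
    """Expand an abbreviated pattern list like 'VP6A, 8, 18A' into ['VP6A', 'VP8', 'VP18A']."""
    body = re.sub(r"^\s*VP\s*=?\s*", "", entry, flags=re.IGNORECASE)  # drop leading 'VP'
    codes = [c.strip().rstrip(".") for c in body.split(",")]
    return ["VP" + c.upper() for c in codes if c]
```

A sense could then be indexed under every pattern code it allows, so that, for example, all verbs taking VP18A can be found. Matching them to example sentences, however, would still require parsing the sentences themselves.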

Sieving the retrieved examples 

Retrieved sentences are sieved for two purposes: 

• to remove, as far as possible, all sentences that may lend themselves to more than one interpretation; e.g. in an exercise on pronouns it should be quite clear from the context which pronoun to use.

• to retrieve only enough examples to create an adequate exercise. Selecting sentences at random from those retrieved ensures that students will be given a different exercise every time they rehearse that linguistic skill. However, if the exercise has not already been graded for difficulty, the use of word frequency counts is preferable.
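The second sieving purpose can be sketched as follows. When the material is already graded, the sketch samples at random so that each rehearsal differs. Otherwise it prefers examples whose target word is more frequent. Treating higher frequency as "easier" and the `(target_word, sentence)` pairing are assumptions of this sketch. The ambiguity filter (the first purpose) is not shown, because the paper does not describe how it is done.

```python
import random
from typing import Dict, List, Optional, Tuple

Example = Tuple[str, str]  # (target word, sentence)

def select_examples(examples: List[Example], n: int, graded: bool,
                    freq_rank: Optional[Dict[str, int]] = None,
                    rng: Optional[random.Random] = None) -> List[Example]:
    """Keep at most n examples: random if pre-graded, else by word frequency rank."""
    if len(examples) <= n:
        return list(examples)
    if graded:
        rng = rng or random.Random()
        return rng.sample(examples, n)   # a different exercise on each rehearsal
    if freq_rank is None:
        raise ValueError("ungraded material needs a frequency ranking")
    # Lower rank = more frequent word; unranked words sort last.
    ranked = sorted(examples, key=lambda ex: freq_rank.get(ex[0], float("inf")))
    return ranked[:n]
```

Passing a seeded `random.Random` makes the random path reproducible for testing. In normal use the default unseeded generator gives each student a fresh selection.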

Generating the exercises 

Where appropriate, the retrieved examples are re-ordered. Most exercises are generated as Cloze exercises that can be completed by the student using only a pointing device, e.g. a mouse. This has the disadvantage that it tests only recognition, not recall, but it requires little manual dexterity, so the student can complete it quickly. The hypertext templates used are described in Wilson (1992). Figure 3 shows part of an exercise on pronouns generated from the Susanne Corpus; Figure 4 shows a specialist vocabulary exercise for a student of architecture generated from the OALD.
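A minimal sketch of Cloze generation in the style of the pronoun exercise follows. The sketch blanks every whole-word, case-insensitive occurrence of the target words, records the answers, and builds the word bank from which the student picks with the mouse. The blank marker, the function names and the scoring rule are assumptions. The hypertext templates the paper refers to are not reproduced here.

```python
import re
from typing import List, Sequence, Tuple

def make_cloze(sentences: Sequence[str], targets: Sequence[str], blank: str = "____"
               ) -> Tuple[List[str], List[List[str]], List[str]]:
    """Blank every target word; return (items, answers per item, word bank)."""
    # \b on both sides stops "he" matching inside "the" or "here".
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, targets)) + r")\b", re.IGNORECASE)
    items, answers = [], []
    for sentence in sentences:
        found = pattern.findall(sentence)
        if found:                        # skip sentences with nothing to blank
            items.append(pattern.sub(blank, sentence))
            answers.append(found)
    word_bank = sorted({t.lower() for t in targets})
    return items, answers, word_bank

def score(answers: List[List[str]], responses: List[List[str]]) -> int:
    """Count blanks answered correctly (case-insensitive)."""
    return sum(
        a.lower() == r.lower()
        for item_answers, item_responses in zip(answers, responses)
        for a, r in zip(item_answers, item_responses)
    )
```

Because the student only selects from the word bank, `score` compares selections rather than typed text. This matches the recognition-not-recall trade-off noted above.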
Complete the following sentences by choosing the correct pronouns:

he him himself she her herself 
it it itself they them themselves 

They saw it before did, even with my binoculars. 

Occasionally, for no reason that I could see, would suddenly alter the angle of their trot. 

For ten minutes ran beneath the squall, raising their arms and, for the first time, shouting and capering. 

bent down, a black cranelike figure, and put his mouth to the ground. 



Figure 3. Part of an exercise on pronouns derived from the Susanne Corpus, G04, readability grade: 7.31



Select an architectural term from the column on the left. 

Then, from the column on the right, select the definition that corresponds. 

caryatid either end of the transverse part of a cross-shaped church

coping curved edge where two vaults meet (in a roof) 

corbel draped statue of a female figure used as a support (eg a pillar) in a building

Corinthian high, narrow, pointed arch or window 

cornice line of (sometimes overhanging) stonework or brickwork on top of a wall 

Doric column in ancient Greek architecture, with a decoration of leaves on the capital 



Figure 4. Part of a specialist vocabulary exercise on architecture from OALD 



Evaluation 

The format of the exercise is popular: students have no difficulty in using it and are generally enthusiastic. 
The feedback given to them is clear and they have no problem in monitoring their progress. What is not clear 
from the evaluation so far is to what degree there is a correlation between readability grade and exercise 
difficulty. For certain linguistic skills, such as the use of articles, students' ability in the skill seems not to vary with the readability grade of the text. This may not matter, but it requires further investigation. Other exercises, which were difficult to classify initially, seem to be uniformly difficult even for native speakers. An example is differentiating word pairs (early/soon, imply/infer, constantly/continually, anticipate/expect). However, testing is incomplete: in particular, tests involving students with more widely differing levels of ability are a priority.
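One way to quantify the open question of whether readability grade predicts exercise difficulty is a Pearson correlation. It could be computed between the readability grade of each source text and students' mean success rate on exercises drawn from it. The pairing of these two measures is an assumption about how the evaluation might be run, and no study data is implied. Python 3.10 and later also provide `statistics.correlation`, which gives the same coefficient.

```python
from math import sqrt
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

A coefficient near zero for a skill such as article use would be consistent with the observation that ability in that skill does not vary with readability grade. A strongly negative coefficient (higher grade, lower success) would support using readability as the difficulty parameter.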